Daghestanian loans database
Authors: Ilya Chechuro, Michael Daniel, and Samira Verhees.
The DagLoans Database, edited by Ilia Chechuro, Michael Daniel, Nina Dobrushina and Samira Verhees, is a scientific resource created by the Linguistic Convergence Laboratory of HSE University, Moscow (2019). The project was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported by a subsidy from the Russian Academic Excellence Project ‘5-100’. The database contains wordlists collected as part of the Daghestanian loans project by the Linguistic Convergence Laboratory at NRU HSE. The aim of the 146-item shortlist, which is based on the World Loanword Database questionnaire, is to measure lexical contact at the micro-level: that is, to quantify lexical convergence among the speech communities of minority languages at the village level, and to detect fine-grained areal patterns that go beyond general observations about the spheres of influence of particular languages.
The database provides wordlists of 146 lexical meanings collected from 147 sources (dictionaries and speakers) of 23 languages. The set of lexical meanings is a subset of the World Loanword Database (WOLD) wordlist compiled by Haspelmath and Tadmor (2009); it uses the same lexical entries but a different set of IDs. The database can be linked to any other lexical database upon agreement with the authors. The language sample includes both languages of Daghestan and languages spoken outside Daghestan that are relevant for the study of lexical influence in the region, such as Persian, Russian and Arabic.
The Daghestanian Loans project studies the lexical influence of different languages in Daghestan at the micro-level, i.e. at a level of granularity that is sensitive to differences between village varieties. Data from the project on multilingualism in Daghestan show that the conditions and the degree of language contact are unique to each village. Our aim is to discover the lexical correlates of these differences. For this purpose, we compiled a wordlist of 146 concepts for cross-linguistic comparison and developed a method for quick data collection in the field. Using a fixed list of concepts allows us to find quantitative correlates of qualitative differences between areas, such as the spread of a certain lingua franca, the presence and degree of contact with particular languages, and migratory processes.
Collecting data in neighboring villages allows us to show variation between villages on the map, and it reveals the contours of the zones of influence of specific L2s. For example, lexical influence of the local Turkic languages (Azerbaijani, Kumyk and Nogai) is found throughout Daghestan. In the south, however, where Azerbaijani served as a lingua franca for a long time, this influence is much stronger. In the north of Daghestan, bilingualism with Turkic languages was not common, and almost all Turkic borrowings in minor local languages are shared with Avar, a major native language. Turkic influence in the north was thus most likely mediated by Avar. Our first paper (currently in the final stages of preparation) details how we can detect different zones by comparing lexical samples from villages and major neighboring languages.
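The reasoning about mediation can be made concrete with a small sketch. It is not the project's own code: the function name, the set IDs and the idea of measuring mediation as a simple share are assumptions made for illustration. The sketch takes the Turkic-origin similarity sets attested in one village list and asks what fraction of them is also attested in a major language such as Avar.

```python
def mediated_share(village_turkic_sets, major_language_sets):
    """Share of a village list's Turkic-origin similarity sets that are
    also attested in a major language (hypothetical measure)."""
    if not village_turkic_sets:
        return None  # no Turkic-origin sets, so the share is undefined
    shared = village_turkic_sets & major_language_sets
    return len(shared) / len(village_turkic_sets)


# Hypothetical similarity-set IDs, invented for this example.
village_turkic = {"pepper_2", "beeswax_9", "towel_1", "window_3"}
avar = {"pepper_2", "beeswax_9", "towel_1", "bread_4"}

share = mediated_share(village_turkic, avar)
print(share)  # 0.75: three of the four Turkic-origin sets are shared with Avar
```

Under this reading, a share close to 1 is compatible with mediation through the major language, while a low share points to other channels such as direct contact. The share alone cannot establish the direction of borrowing.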
Contents:
target words: 25796
languages: 23
How to cite this project
If you use data from the database in your research, please cite as follows:
Chechuro I., Daniel M., Dobrushina N., and Verhees S. 2019. Daghestanian loans database. Linguistic Convergence Laboratory, HSE. (Available online at https://lingconlab.github.io/Dagloan_database/DL_database.html, accessed on June 05, 2019.)
The database
For now, the table shows source Concepts and target Words. Each target word is assigned to a similarity Set: a set of words that have the same meaning and look similar. In the future, data on borrowing sources will be added. Metadata includes the name of the Village where the word was recorded, the administrative District it belongs to, the Language spoken there, and the List ID. Each List ID corresponds to a particular speaker or, in some cases, to a written source such as a dictionary. The data are accessible at: Github/LingConLab/DagloanDatabase.
The dataset in the dummy format is available here.
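As a rough illustration of how the table relates to a dummy-coded dataset, the sketch below converts a few long-format rows into presence/absence values. It assumes that "dummy format" means one 0/1 value per list and lexeme, and that a lexeme key combines a concept with its similarity-set number (as in `the_beeswax_9`). The rows, the helper names and the tuple layout are hypothetical and are not taken from the actual data files.

```python
from collections import defaultdict

# Hypothetical long-format rows: (list ID, concept, similarity-set number).
rows = [
    ("Arkhit1", "the pepper", 1),
    ("Arkhit1", "the beeswax", 2),
    ("Darvag1", "the pepper", 1),
    ("Darvag1", "the beeswax", 9),
]


def lexeme_key(concept, set_id):
    """Build a key such as 'the_beeswax_9' from a concept and a set number."""
    return f"{concept.replace(' ', '_')}_{set_id}"


def to_dummy(rows):
    """Return {list ID: {lexeme key: 0 or 1}} for all lexemes in the data."""
    attested = defaultdict(set)
    for list_id, concept, set_id in rows:
        attested[list_id].add(lexeme_key(concept, set_id))
    all_lexemes = sorted(set().union(*attested.values()))
    return {
        list_id: {lex: int(lex in have) for lex in all_lexemes}
        for list_id, have in attested.items()
    }


dummy = to_dummy(rows)
print(dummy["Arkhit1"]["the_beeswax_9"])  # 0: this set is not attested in Arkhit1
print(dummy["Darvag1"]["the_beeswax_9"])  # 1: this set is attested in Darvag1
```

A wide 0/1 layout of this kind is convenient for computing distances between lists, at the cost of many more columns than the long format.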
Version: 2019-06-05. For questions or comments contact jh.verhees@gmail.com.
Map of the surveyed villages
Hover over or click on a dot on the map to learn more. The color of a dot corresponds to the number of lists collected in that village. Orange = dictionary data.
Sample lexical map
The map below shows the distribution of different stems for the concept ‘pepper’.
Sources of lexical influence
Cluster Dendrogram of Foreign Influence
This tree is built as follows: a pair of cells gets a distance of 0 only if both cells are non-empty and match; otherwise the distance is 1. Missing values (NAs) are not counted.
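The rule can be sketched in a few lines. This is an illustration under stated assumptions, not the code used to build the tree: missing values are represented as `None`, two lists are compared cell by cell, and the per-cell distances are averaged over the cells that were actually compared. The averaging step and the function name are assumptions; the source only specifies the per-cell rule.

```python
def distance_skip_na(a, b):
    """Mean per-cell distance between two lists, skipping any cell where
    either value is missing (None). Returns None if nothing is comparable."""
    if len(a) != len(b):
        raise ValueError("lists must cover the same cells")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # NA: this cell is not counted at all
        compared += 1
        if x != y:
            mismatches += 1  # distance 1 for non-matching cells
    return mismatches / compared if compared else None


# Hypothetical presence/absence values for four lexemes.
list_a = [1, 0, None, 1]
list_b = [1, 1, 1, None]

d = distance_skip_na(list_a, list_b)
print(d)  # 0.5: two cells are comparable, and one of them differs
```

Because cells with missing values are dropped, two lists with many gaps are compared only on the cells they share.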
[Printed table truncated in the source. Its columns are Speaker, Language, Village and District, followed by one column per list ID, from Alibeglo1 to Zilo2.]
Cluster Dendrogram of Foreign Influence (Strict Distances)
This tree is built as follows: a pair of cells gets a distance of 0 only if both cells are non-empty and match; otherwise the distance is 1. Here missing values (NAs) are counted, which leads to very large distances even between similar speakers.
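A small sketch shows why counting missing values inflates distances. It is an assumption-laden illustration rather than the project's code: missing values are represented as `None`, any cell with a missing value counts as a full mismatch, and the result is normalized by the total number of cells. The normalization and the function name are assumptions.

```python
def distance_strict(a, b):
    """Share of cells that are not a match between two non-empty values.
    A cell with a missing value (None) on either side counts as distance 1."""
    if len(a) != len(b):
        raise ValueError("lists must cover the same cells")
    if not a:
        return None
    mismatches = sum(
        1 for x, y in zip(a, b) if x is None or y is None or x != y
    )
    return mismatches / len(a)


# Two hypothetical lists that agree on every cell they both fill in.
list_a = ["set1", "set2", None, "set4"]
list_b = ["set1", "set2", "set3", None]

d_strict = distance_strict(list_a, list_b)
print(d_strict)  # 0.5: both cells with a missing value count as mismatches
```

In this example the two lists never contradict each other, yet half of the cells contribute a distance of 1 because one side is missing.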
Mediation of Turkic influence (Speakers)
Mediation of Turkic influence (Villages)
Mediation of Total Turkic Influence
Mediation of Standard Azerbaijani Influence
Mediation of Turkic Influence via Major Languages
   Speaker    Language  Village   District  Lexeme         Present
1  Alibeglo1  Georgian  Alibeglo  Qax       the_beeswax_9  0
2  Arkhit1    Lezgian   Arkhit    Khiv      the_beeswax_9  0
3  Arkhit2    Lezgian   Arkhit    Khiv      the_beeswax_9  0
4  Arkhit3    Lezgian   Arkhit    Khiv      the_beeswax_9  0
5  Arkhit4    Lezgian   Arkhit    Khiv      the_beeswax_9  0
6  Arkhit5    Lezgian   Arkhit    Khiv      the_beeswax_9  0
References
Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for “Grid” Graphics. https://CRAN.R-project.org/package=gridExtra.
Barnier, Julien. 2019. Rmdformats: HTML Output Formats and Templates for ’Rmarkdown’ Documents. https://CRAN.R-project.org/package=rmdformats.
Boettiger, Carl. 2017. Knitcitations: Citations for ’Knitr’ Markdown Files. https://CRAN.R-project.org/package=knitcitations.
Galili, Tal. 2015. “Dendextend: An R Package for Visualizing, Adjusting, and Comparing Trees of Hierarchical Clustering.” Bioinformatics. doi:10.1093/bioinformatics/btv428.
Gehlenborg, Nils. 2017. UpSetR: A More Scalable Alternative to Venn and Euler Diagrams for Visualizing Intersecting Sets. https://CRAN.R-project.org/package=UpSetR.
Haspelmath, Martin, and Uri Tadmor. 2009. Loanwords in the World’s Languages: A Comparative Handbook. Walter de Gruyter.
Moroz, George. 2017. Lingtypology: Easy Mapping for Linguistic Typology. https://CRAN.R-project.org/package=lingtypology.
R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sievert, Carson. 2018. Plotly for R. https://plotly-r.com.
Suzuki, Ryota, and Hidetoshi Shimodaira. 2015. Pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling. https://CRAN.R-project.org/package=pvclust.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2017. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC. http://www.crcpress.com/product/isbn/9781466561595.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.name/knitr/.
———. 2019. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2019. DT: A Wrapper of the Javascript Library ’Datatables’. https://CRAN.R-project.org/package=DT.